Preferential survival in models of complex ad hoc networks 
There has been a rich interplay in recent years between (i) empirical investigations of real-world
dynamic networks, (ii) analytical modeling of the microscopic mechanisms that drive the emergence 
of such networks, and (iii) harnessing of these mechanisms to either manipulate existing networks, 
or engineer new networks for specific tasks. We continue in this vein, and study the deletion 
phenomenon in the web by following two different sets of web-sites (each comprising more than 
150,000 pages) over a one-year period. Empirical data show that there is a significant deletion 
component in the underlying web networks, but the deletion process is not uniform. This motivates 
us to introduce a new mechanism of preferential survival (PS), where nodes are removed according to 
the degree-dependent deletion kernel $D(k) \propto k^{-\alpha}$, with $\alpha > 0$. We use the mean-field rate equation
approach to study a general dynamic model driven by Preferential Attachment (PA), Double PA
(DPA), and a tunable PS (i.e., with any $\alpha > 0$), where $c$ nodes ($c < 1$) are deleted per node added
to the network, and verify our predictions via large-scale simulations. One of our results shows
that, unlike in the case of uniform deletion (i.e., where $\alpha = 0$), the PS kernel, when coupled with
the standard PA mechanism, can lead to heavy-tailed power law networks even in the presence of
extreme turnover in the network. Moreover, a weak DPA mechanism, coupled with PS, can help 
make the network even more heavy-tailed, especially in the limit when deletion and insertion rates 
are almost equal, and the overall network growth is minimal. The dynamics reported in this work 
can be used to design and engineer stable ad hoc networks and explain the stability of the power 
law exponents observed in real-world networks. 
I. INTRODUCTION
A. Motivation and Background 

The empirical study of real-world networks such as the World Wide Web, the movie actor collaboration network, and scientific citation networks has attracted considerable interest. Such large-scale and complex networks have been treated as physical systems, and stochastic models based on randomized mechanisms or protocols have been developed to model and explain empirically observed network characteristics. Concomitantly, several works have shown that network dynamic models have applications beyond merely modeling real-world systems: it has been shown that randomized protocols can be used to design and engineer systems, with peer-to-peer networks being the primary example […]. One of the motivations of this work is to continue such efforts aimed at discovering new mechanisms that play an important role in organic real-world networks, and that might be useful in designing engineered networks and protocols.

Well-known examples of such data-inspired dynamic models include preferential attachment (PA) and its variants […], copying […], PA with fitness […], double preferential attachment of links [16], and the rewiring of links [17]. These mechanisms, however, model the dynamics of a growing network, where the effect of node deletion is not considered significant.
Many real-world networks experience significant rates of
node deletions. For example, nodes join and depart from 
peer-to-peer networks in a random and rapid manner, 
and movie actors end their careers, effectively removing 
themselves from collaboration networks. Hence, devel- 
oping a network dynamic model for the class of ad hoc 
networks with a significant deletion component is impor- 
tant. 

Several recently proposed models have addressed the node deletion process […]. However, these works take an egalitarian approach, modeling the deletion process as uniform node failure. The uniform deletion model fails to account for the heterogeneity of the nodes' abilities to compete for survival, or to participate for varying periods of time in a network. Interestingly, these uniform deletion models predict that a network's power law (PL) degree distribution, a signature of several real-world networks such as the Web, will disappear as the deletion rate becomes more significant when the primary mechanism driving the network formation is the PA rule […]. In order to retain a heavy-tailed distribution (i.e., networks with PL exponent $\gamma$ less than 3 and closer to 2), the vanilla PA mechanism has to be augmented with a dominant second mechanism that initiates new edges from the existing nodes, such as a distributed compensation mechanism as introduced in […], or a Double Preferential Attachment (DPA) mechanism (see Section III B and […]). It is not clear whether organic networks with high deletion rates naturally and inherently possess such compensatory mechanisms to retain their empirically observed heavy-tailed distributions. Moreover, while in an engineered network one might be able to enforce such stabilizing protocols (in order to retain the advantages accrued from the natural hierarchy present in a heavy-tailed network), it might be too expensive or too difficult to do so, and alternative mechanisms that can stabilize the network structure might be needed.

In the absence of any empirical studies on the node removal process of real-world networks, the simple uniform deletion model is a reasonable assumption to work with. We, however, ask: is it possible to empirically study the node deletion process of an organically grown network and quantify its characteristics? We turn to the Web, which has proved to be a treasure trove for mechanism and modeling sleuths. Recent empirical studies of the Web suggest that the current Web environment is extremely dynamic. For example, Ntoulas et al. found that 20% of the web pages in their large data set are permanently removed in just 1 month and 50% of the web pages are deleted in 9 months [21]. Similar findings on the short lifetime of web pages are reported in [22], [23]. While these works categorically establish that node deletion is a significant event that should be included in any dynamic modeling of the Web, they do not characterize the nature of the deletion dynamics, and in particular whether it is uniform or not.



B. Summary of Results 

In a competitive network such as the Web, we expect the nodes to compete for survival in addition to competing for links. A webpage's degree is a good approximation of its ability to compete, since heavily linked Web documents enjoy numerous benefits, such as possibly being ranked higher in search engine results and attracting higher traffic and, thus, higher revenue through online advertisements. As a result, we conjecture a mechanism of preferential survival (PS), whereby each node's chance of survival increases with its degree; in other words, pages with higher degrees are less likely to be deleted than their counterparts with lower degrees.

In order to verify our conjecture, we made a longitudinal study of Web data, in which we followed two different sets of web-sites (each comprising more than 150,000 pages) over a period of one year, as described in Sec. II [28]. We found that there indeed exists a significant rate of node deletion in the crawl data we studied. The deletion rates $c$ (the average number of nodes deleted per node added to the network, i.e., the network grows at the rate of $1 - c$) for the sites we tracked are observed to be as high as 0.9 (see the Appendix for further details). We next developed a method to quantify the node deletion kernel, and found that the conjectured PS mechanism is indeed in play and that the degree-dependent deletion kernel (i.e., the probability that a node of degree $k$ is deleted at any time step) behaves as $D(k) \propto k^{-\alpha}$ ($\alpha > 0$), where $\alpha$ is estimated to be 1.0 for our crawl data. Interestingly, despite the high node deletion rates in our crawl data, we found no sign of the disappearance of the power law degree distribution.

The empirical findings motivated us to study the role of the preferential survival mechanism in the well-studied stochastic PA and DPA models; see Sec. III. That is, at every time step, in addition to adding a new node that initiates preferential edges, an existing node is chosen according to the PS deletion kernel $D(k) \propto k^{-\alpha}$ ($\alpha > 0$), and this node (along with all of its edges) is then deleted with probability $c$. Thus, for $\alpha = 0$ the model reduces to the already studied case of uniform deletion, where nodes are deleted at the rate of $c$. Otherwise, as $\alpha$ increases, the dynamic shields higher-degree nodes against deletion, even though the overall deletion rate remains fixed at $c$. The main predictions of our analysis can be summarized as follows (all analytical results are verified by large-scale simulations):

1. In the special case of PA and only PS with $\alpha = 1$, our analysis shows that the power law exponent is expected to be $\gamma = 3$ for any turnover rate $c$ between 0 and 1. Our large-scale simulation results are in good agreement with our analysis. Thus, the PS mechanism by itself can arrest the divergence of the PL exponent predicted for the uniform deletion case (i.e., $\alpha = 0$). Furthermore, we analytically derive the node lifetime distribution for the preceding case of PS (with $\alpha = 1.0$) and PA, and find that the probability that a given node survives for $t$ time steps converges to a constant as $t$ grows. The analytical distribution closely matches the empirical distribution of lifetimes in our crawl, providing further credence to the model.

2. As a comparison, when we study the case of uniform deletion and DPA, we find that in order for the PL exponent to be stabilized at around 3 for high rates of deletion (i.e., when $c \approx 1$), the number of doubly preferential (DP) edges has to be increased significantly; i.e., if at every time step each incoming node brings in $m$ edges on average, then the existing nodes have to initiate $bm$ DP edges at every time step, where $b \gg 1$. Thus, one needs a very strong DPA component to compensate for uniform deletion.

3. In the case of both PS and DPA, we show that the power law exponent actually decreases as the network experiences a higher turnover rate. Thus, when used in conjunction with the PS dynamic, even a weak DPA mechanism (e.g., for the empirically estimated values $m = 10$ and $b = 0.1$) can be critical in driving the power law exponent close to 2 even in the face of an extremely high rate of turnover.

Although the PS mechanism is inspired by empirical Web dynamics, a complete model of the Web should take into account other factors, such as the nodes' varying fitness in attracting links […], and such a modeling effort is beyond the scope of this paper. Moreover, while deletion is dominated by the PS mechanism for the two sets of crawls studied in this paper, further work studying a larger web sample is needed to justify a general conclusion that PS is a dominant mechanism for all or a majority of the Web.

However, as our analysis and simulations indicate, the PS dynamic, when incorporated into the PA and DPA models, leads to a stable structure and can be used to model and design protocols to engineer large-scale complex networks. Potential implications of the PS mechanism for both modeling real-world networks and designing network applications are discussed further in Section IV.



II. EMPIRICAL MEASUREMENTS 

A. The Dataset 

Our dataset of the World Wide Web was obtained from the Stanford WebBase project [29]. We randomly selected a set of web hosts, comprising roughly 170 thousand pages, and tracked their evolution monthly for the year 2006. This dataset is denoted as SET1. In order to further validate our results, we sampled another set of web hosts comprising roughly 150 thousand pages and tracked their evolution for the same period. For this dataset, more than 99% of the nodes belong to the giant weakly connected component (i.e., when all edges are treated as undirected) over the examined period. This dataset is denoted as SET2. The WebBase crawler extracts a maximum of 10 thousand pages per host; this limit is not a problem, however, since none of the tracked hosts reaches it.



B. Evidence of Preferential Survival of Webpages 

We mined our Web dataset for direct empirical evidence of the preferential survival mechanism. We regard the web graph as an undirected network and investigate the degree distribution of the set of deleted nodes in a given month (i.e., the set of nodes that are alive in a given month but disappear in the following month). If nodes were deleted uniformly at random, the degree distribution of the set of deleted nodes would be identical to the network's degree distribution. For SET1, we found the power law exponent of the degree distribution of the set of deleted nodes to be $\gamma_{\mathrm{del}} \approx 3.0$ (Fig. 1(b)), which is significantly different from the power law exponent for the entire network, $\gamma \approx 2.0$ (Fig. 1(a)). We found similar results for SET2: the power law exponent of the degree distribution of the set of removed nodes is $\gamma_{\mathrm{del}} \approx 3.2$ (Fig. 1(d)), which is significantly greater than the power law exponent for the entire network, $\gamma = 2.2$ (Fig. 1(c)).
FIG. 1: (a) SET1: the power law exponent of the degree distribution of the sampled Web for each month in 2006. The inset shows the degree distribution for May 2006. (b) SET1: the power law exponent of the degree distribution of the set of removed nodes for different months in 2006. The inset shows the power law degree distribution for the set of webpages that were removed in March 2006. (c) SET2: the degree distribution for June 2006; the power law exponent is $\gamma = 2.2$. (d) SET2: the power law degree distribution for the set of webpages that were removed in June 2006; the power law exponent is $\gamma_{\mathrm{del}} = 3.2$.
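Our findings from both SET1 and SET2 suggest that a node is removed according to the deletion probability kernel $D(k) \propto k^{-\alpha}$, where $\alpha = \gamma_{\mathrm{del}} - \gamma \approx 1.0$ in our case. (If $P(k) \propto k^{-\gamma}$ and a node of degree $k$ is deleted with probability proportional to $k^{-\alpha}$, the deleted nodes have a degree distribution proportional to $k^{-\alpha} P(k) \propto k^{-(\gamma + \alpha)}$, so that $\gamma_{\mathrm{del}} = \gamma + \alpha$.) We will show in our model (see Sec. III A) that a deletion kernel with $\alpha = 1$ leads to the stabilization of the power law exponent at $\gamma = 3$ for any turnover rate $c$ between 0 and 1.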
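The relation $\alpha = \gamma_{\mathrm{del}} - \gamma$ suggests a simple estimation recipe: fit a power law exponent to the degrees of all nodes and to the degrees of the deleted nodes, then take the difference. The sketch below is a minimal illustration under our own assumptions; the function names `powerlaw_mle` and `estimate_alpha`, the common cutoff `k_min`, and the use of the standard maximum-likelihood estimator $\hat\gamma = 1 + n\,[\sum_j \ln(k_j/k_{\min})]^{-1}$ are hypothetical choices, since the text does not state which fitting procedure produced Fig. 1.

```python
import math
import random


def powerlaw_mle(samples, x_min, discrete=True):
    """MLE of gamma for a tail p(x) ~ x**-gamma above x_min.

    For integer degrees (discrete=True) the usual continuous approximation
    with x_min - 0.5 inside the logarithm is applied.
    """
    tail = [x for x in samples if x >= x_min]
    shift = x_min - 0.5 if discrete else x_min
    log_sum = sum(math.log(x / shift) for x in tail)
    if not tail or log_sum <= 0:
        raise ValueError("need samples above x_min with nonzero spread")
    return 1.0 + len(tail) / log_sum


def estimate_alpha(all_degrees, deleted_degrees, k_min, discrete=True):
    """alpha = gamma_del - gamma, assuming D(k) ~ k**-alpha and P(k) ~ k**-gamma."""
    gamma = powerlaw_mle(all_degrees, k_min, discrete)
    gamma_del = powerlaw_mle(deleted_degrees, k_min, discrete)
    return gamma_del - gamma


if __name__ == "__main__":
    # Synthetic check on hypothetical data (not the WebBase crawls):
    # paretovariate(a) has density ~ x**-(a + 1), i.e. gamma = a + 1.
    rng = random.Random(1)
    network = [rng.paretovariate(1.0) for _ in range(200_000)]  # gamma = 2
    deleted = [rng.paretovariate(2.0) for _ in range(200_000)]  # gamma_del = 3
    alpha = estimate_alpha(network, deleted, k_min=1.0, discrete=False)
    print(f"estimated alpha = {alpha:.2f}")
```

On real degree data the choice of `k_min` matters considerably; the synthetic Pareto samples only confirm that the difference of the two fitted exponents recovers the kernel exponent when both tails are clean power laws.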



C. Resilience of the Power Law Exponent 

We tracked the PL exponent $\gamma$ for our Web datasets under very high turnover rates (see Appendix). The power law exponent shows no sign of divergence and is highly resilient under a high rate of turnover. For SET1, the exponent stays around $\gamma \approx 2.0$ (Fig. 1(a)); to be self-consistent, only edges linking the tracked pages are considered in estimating the degree distributions. For SET2, the exponent is around $\gamma \approx 2.2$ over the examined period.



III. THE MODEL 

In order to study the implications of the preferential survival mechanism, we propose the following dynamic model. At each time step, (i) a node joins the network and makes $m$ links to $m$ existing nodes chosen preferentially; (ii) with probability $c$, a node is chosen for removal according to the deletion kernel $D(k) \propto k^{-\alpha}$ and is removed along with all of its associated links; and (iii) $bm$ new internal edges are added between existing nodes in a double preferential attachment (DPA) manner. The parameter $c$ denotes the turnover rate or deletion rate, which is defined as the rate of node removal divided by the rate of node addition.
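The three steps above can be sketched as a direct stochastic simulation. This is a minimal, unoptimized illustration written under our own assumptions, not the authors' simulation code: the seed graph (a complete graph on $m + 1$ nodes), the floor $\max(k, 1)^{-\alpha}$ that gives isolated nodes a finite deletion weight, the randomized rounding of a fractional $bm$, and the skipping of DPA pairs that would form self-loops or duplicate edges are all hypothetical choices.

```python
import random
from itertools import accumulate


def simulate(steps, m=4, c=0.5, alpha=1.0, b=0.0, seed=0):
    """Simulate PA insertion, preferential-survival deletion and DPA edges.

    Returns an adjacency dict {node_id: set_of_neighbor_ids}.
    """
    rng = random.Random(seed)
    # Assumed seed graph: a complete graph on m + 1 nodes.
    adj = {u: {v for v in range(m + 1) if v != u} for u in range(m + 1)}
    next_id = m + 1

    def preferential(pool, n_draws):
        # n_draws picks (with replacement), each with probability proportional to degree.
        cum = list(accumulate(len(adj[v]) for v in pool))
        return rng.choices(pool, cum_weights=cum, k=n_draws)

    for _ in range(steps):
        # (i) Insertion: the new node links to m distinct existing nodes chosen by PA.
        pool = [v for v in adj if adj[v]]  # only nodes with positive degree
        need = min(m, len(pool))
        targets = set()
        while len(targets) < need:
            targets.update(preferential(pool, need - len(targets)))
        adj[next_id] = set(targets)
        for v in targets:
            adj[v].add(next_id)
        next_id += 1

        # (ii) Preferential survival: with probability c, delete one node chosen
        # with probability proportional to max(k, 1) ** -alpha (assumed floor at k = 1).
        if len(adj) > 1 and rng.random() < c:
            nodes = list(adj)
            weights = [max(len(adj[v]), 1) ** (-alpha) for v in nodes]
            victim = rng.choices(nodes, weights=weights, k=1)[0]
            for v in adj.pop(victim):
                adj[v].discard(victim)

        # (iii) DPA: b*m internal edges on average (randomized rounding); both
        # endpoints are drawn by PA from degrees frozen at the start of this sub-step.
        expected = b * m
        n_internal = int(expected) + (rng.random() < expected - int(expected))
        pool = [v for v in adj if adj[v]]
        if n_internal and len(pool) >= 2:
            ends = preferential(pool, 2 * n_internal)
            for u, v in zip(ends[::2], ends[1::2]):
                if u != v:  # self-loops are skipped; duplicate edges collapse in the sets
                    adj[u].add(v)
                    adj[v].add(u)
    return adj


def degree_histogram(adj):
    """Number of nodes of each degree."""
    hist = {}
    for nbrs in adj.values():
        hist[len(nbrs)] = hist.get(len(nbrs), 0) + 1
    return hist


if __name__ == "__main__":
    g = simulate(steps=2000, m=4, c=0.5, alpha=1.0, b=0.0, seed=42)
    print("nodes:", len(g))
    print("smallest degrees:", sorted(degree_histogram(g).items())[:10])
```

Each step rebuilds the candidate list and cumulative weights, so the cost grows linearly with network size per step; this keeps the sketch short and easy to check but is far too slow for the 20,000-node runs reported in the figures, where a degree-indexed sampling structure would be preferable.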

Each node in the network is labeled by its insertion time. Let $D(i,t)$ be the probability that the $i$th node is still in the network at time $t$, where $t > i$. Note that $D(i,t)$ yields the lifetime distribution of node $i$. We have:

$$D(i,t+1) = D(i,t)\left[1 - c\,\frac{k(i,t)^{-\alpha}}{N(t)\,\langle k^{-\alpha}(t)\rangle}\right] \tag{1}$$

The initial condition is $D(i,i) = 1$, and $\langle k^{-\alpha}(t)\rangle = \sum_k k^{-\alpha} P(k,t)$, which can be considered as the "$-\alpha$" moment of the degree distribution at time $t$ (see Table I for the definition of symbols).

Assuming the $i$th node is still in the network at time $t$, the evolution of its expected degree is described by the following equation:

$$\frac{dk(i,t)}{dt} = m\,\frac{k(i,t)}{S(t)} - c\,k(i,t)\,P(\text{a neighbor is removed}) + 2bm\,\frac{k(i,t)}{S(t)} \tag{2}$$



TABLE I: Table of Definitions

Var.                                   Definition
$k(i,t)$                               expected degree of the $i$th node at time $t$
$S(t)$                                 sum of node degrees at time $t$
$N(t)$                                 size of the network at time $t$
$\langle k(t)\rangle$                  average node degree at time $t$
$\langle k_{\mathrm{del}}(t)\rangle$   average degree of a deleted node at time $t$
$\langle k^{-\alpha}(t)\rangle$        $\sum_k k^{-\alpha} P(k,t)$

Const.                                 Definition
$m$                                    number of connections of the joining node
$c$                                    turnover rate, or number of nodes deleted in each time step
$b$                                    ratio of the number of internal edges added per time step to the number of connections per joining node
$\alpha$                               exponent in the deletion kernel $D(k) \propto k^{-\alpha}$
$\alpha_0$                             the "$-1$" moment of the degree distribution: $\sum_k k^{-1} P(k)$

Here, the sum of node degrees at time $t$ is $S(t) = \langle k(t)\rangle N(t)$, with $\langle k(t)\rangle$ denoting the average node degree at time $t$, and $N(t) = (1-c)t$ is the number of nodes at time $t$.

The initial condition is $k(i,i) = m$. Eq. (2) gives the rate at which the $i$th node gains connections at time $t$. The first term in Eq. (2) describes the attachment of the $m$ preferential links made by the joining node; the second term denotes the deletion of node $i$'s neighbors according to the deletion kernel; the third term describes the appearance of $bm$ new internal edges attaching in a double preferential manner to $2bm$ target nodes. Furthermore, the evolution of $S(t)$ is described by:

$$\frac{dS(t)}{dt} = 2(1+b)m - 2c\,\langle k_{\mathrm{del}}(t)\rangle \tag{3}$$

where $\langle k_{\mathrm{del}}(t)\rangle$ is the average degree of a deleted node at time $t$.

Eq. (3) gives the rate of increase of the sum of node degrees at time $t$: the first term on the right-hand side describes the addition of $(1+b)m$ edges, hence $2(1+b)m$ degrees are added to the sum of degrees; the second term describes the loss of edges as a result of the removed node (each lost edge reduces the sum by 2).

Now, to calculate the power law exponent, we note that

$$P(k,t) = \frac{\text{No. of nodes with degree } k}{\text{Total number of nodes}} = \frac{1}{N(t)} \sum_{i:\,k(i,t)=k} D(i,t) = \frac{1}{N(t)}\, D(i,t) \left|\frac{\partial k(i,t)}{\partial i}\right|^{-1}_{i:\,k(i,t)=k} \tag{4}$$

The general model stated above appears to be very difficult to solve analytically.

A. PA with PS ($\alpha > 0$)



FIG. 2: Power law exponent for the degree distribution of networks generated by simulation ($m = 8$, $b = 0$). At the time the snapshots are taken, the networks reach 20,000 nodes. (a) Preferential survival, $\alpha = 1.0$ (squares): the points do not deviate from 3 for $c < 0.6$ (see Eq. (10)); for $c > 0.6$, the simulation points deviate slightly from $\gamma = 3$ due to the finite number of time steps in the simulations. (b) Preferential survival, $\alpha = 0.5$ (circles), $\alpha = 1.0$ (squares), $\alpha = 1.5$ (triangles): simulation results indicate that a greater $\alpha$ slows down the increase of the power law exponent.

We now consider the preferential survival model by setting the parameters $\alpha = 1.0$ and $b = 0$, where the value of $\alpha$ is inspired by empirical measurements of the Web. We first note that



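$$P(\text{a node of degree } k \text{ is deleted}) = \frac{k^{-1}\,N(t)\,P(k,t)}{N(t)\,\alpha_0} = \frac{k^{-1} P(k,t)}{\alpha_0} \tag{5}$$

where $\alpha_0 = \sum_k k^{-1} P(k)$, under the assumption that $\alpha_0(t) = \sum_k k^{-1} P(k,t)$ converges rapidly to the stationary value $\alpha_0$. This assumption has been verified numerically. Averaging $k$ over Eq. (5), we thus obtain $\langle k_{\mathrm{del}}(t)\rangle = \sum_k k \cdot k^{-1} P(k,t)/\alpha_0 = 1/\alpha_0$. It is simple to show that the evolution of the sum of degrees at time $t$ is given as: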



$$\frac{dS(t)}{dt} = 2m - \frac{2c}{\alpha_0} \tag{6}$$

We now obtain $S(t) = (2m - 2c/\alpha_0)\,t$.

Similarly, we invoke the assumption that $\langle k(t)\rangle = \sum_k k P(k,t)$ converges quickly to the stationary constant $\langle k\rangle$, and we verified this assumption numerically. Now, assuming that the $i$th node's neighbors have the average degree $\langle k\rangle$, the evolution of the expected degree of the $i$th node at time $t$ is described by the following equation after some calculations:

$$\frac{dk(i,t)}{dt} = m\,\frac{k(i,t)}{S(t)} - c\,k(i,t)\,\frac{\langle k\rangle^{-1}}{N(t)\,\alpha_0} = \frac{k(i,t)}{2t} \tag{7}$$
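The last equality in Eq. (7) compresses "some calculations"; one way to fill them in, using only quantities defined above (this expansion is ours, not reproduced from the original derivation), is to note that $\langle k\rangle^{-1}/N(t) = 1/S(t)$ because $S(t) = \langle k\rangle N(t)$, and then substitute $S(t) = (2m - 2c/\alpha_0)\,t$:

$$
\frac{dk(i,t)}{dt}
= \frac{k(i,t)}{S(t)}\left(m - \frac{c}{\alpha_0}\right)
= \frac{k(i,t)\,(m - c/\alpha_0)}{(2m - 2c/\alpha_0)\,t}
= \frac{k(i,t)}{2t}.
$$

The deletion term thus cancels exactly the $c$-dependence of $S(t)$, which is why the growth exponent does not depend on the turnover rate.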

The equation above implies that

$$k(i,t) = m\left(\frac{t}{i}\right)^{\beta} \tag{8}$$

where $\beta = 1/2$.

After substituting Eq. (8) into Eq. (1), one can show that $D(i,t)$ is given by:

$$D(i,t) = \frac{e^{2c_0 (t/i)^{-1/2}}}{e^{2c_0}} \tag{9}$$

where $c_0 = \dfrac{c}{(1-c)\,m\,\alpha_0}$. It is interesting to note that $D(i,t)$ has an initial exponential decay and converges to a positive constant, $e^{-2c_0}$, as $t \to \infty$.
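As a numerical sanity check of Eq. (9), one can iterate the recursion of Eq. (1) with $\alpha = 1$, the mean-field degree $k(i,t) = m(t/i)^{1/2}$ of Eq. (8), $N(t) = (1-c)t$, and a fixed stationary moment $\alpha_0$, and compare the result with the closed form. This is a sketch under stated assumptions: the value $\alpha_0 = 0.15$ is a hypothetical placeholder rather than a measured quantity, and the small residual gap comes from replacing the discrete product by an integral.

```python
import math


def survival_recursion(i, t, m, c, alpha0):
    """Iterate Eq. (1) with alpha = 1 and the mean-field degree of Eq. (8)."""
    d = 1.0
    for s in range(i, t):
        k = m * math.sqrt(s / i)          # k(i, s) = m (s/i)^(1/2)
        n = (1.0 - c) * s                 # N(s) = (1 - c) s
        d *= 1.0 - c / (k * n * alpha0)   # 1 - c k^-1 / (N <k^-1>), <k^-1> = alpha0
    return d


def survival_closed_form(i, t, m, c, alpha0):
    """Eq. (9): D(i,t) = exp(2 c0 (t/i)^(-1/2)) / exp(2 c0)."""
    c0 = c / ((1.0 - c) * m * alpha0)
    return math.exp(2.0 * c0 * (math.sqrt(i / t) - 1.0))


if __name__ == "__main__":
    m, c, alpha0 = 4, 0.5, 0.15  # alpha0 is a hypothetical placeholder value
    i = 1000
    for t in (1100, 2000, 10_000, 100_000):
        rec = survival_recursion(i, t, m, c, alpha0)
        closed = survival_closed_form(i, t, m, c, alpha0)
        print(f"t={t:>6}  recursion={rec:.4f}  Eq.(9)={closed:.4f}")
    c0 = c / ((1 - c) * m * alpha0)
    print(f"limit t -> infinity: exp(-2 c0) = {math.exp(-2 * c0):.4f}")
```

The agreement improves as the birth time $i$ grows, since the per-step deletion probability $c_0\, i^{1/2} s^{-3/2}$ becomes small and the continuum approximation becomes exact.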

We now invoke Eq. (4) to get the stationary degree distribution:

$$P(k) \propto e^{2c_0 m/k}\, k^{-3} \tag{10}$$

which has a power law tail with exponent $\gamma = 3$. This is the same power law exponent obtained for the simple preferential attachment model with no deletions.
The analytical result is verified by large-scale simulations (see Fig. 2(a)). Thus, we found that preferential survival is a self-stabilization mechanism that nullifies the harmful effect that node deletion has on the power law exponent. Consequently, the power law exponent remains at 3 even in the face of node turnover under the preferential survival mechanism.
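To see how Eq. (4) turns Eqs. (8) and (9) into the tail of Eq. (10), the sketch below evaluates $P(k) = D(i,t)\,|\partial k/\partial i|^{-1}/N(t)$ along the mean-field curve $i = t\,(m/k)^2$ and reports the local log-log slope, which should approach $-3$ for $k \gg m$ (for $\beta = 1/2$, $|\partial k/\partial i| = k/(2i)$). As above, $\alpha_0 = 0.15$ is a hypothetical placeholder, and the helper names are illustrative.

```python
import math


def p_of_k(k, t, m, c, alpha0):
    """Eq. (4) evaluated with k(i,t) = m (t/i)^(1/2) and D(i,t) from Eq. (9)."""
    i = t * (m / k) ** 2                   # invert Eq. (8) for the node with degree k
    c0 = c / ((1.0 - c) * m * alpha0)
    d = math.exp(2.0 * c0 * (math.sqrt(i / t) - 1.0))  # Eq. (9)
    dk_di = k / (2.0 * i)                  # |dk/di| for beta = 1/2
    n = (1.0 - c) * t                      # N(t)
    return d / (dk_di * n)


def local_slope(k, t, m, c, alpha0, h=1e-3):
    """Central-difference estimate of d ln P / d ln k at degree k."""
    up = p_of_k(k * math.exp(h), t, m, c, alpha0)
    down = p_of_k(k * math.exp(-h), t, m, c, alpha0)
    return (math.log(up) - math.log(down)) / (2.0 * h)


if __name__ == "__main__":
    m, c, alpha0, t = 4, 0.5, 0.15, 10**6  # alpha0 is a hypothetical placeholder
    for k in (8, 40, 400, 4000):
        print(f"k={k:>5}  slope={local_slope(k, t, m, c, alpha0):+.3f}")
```

The time $t$ cancels in $P(k)$, consistent with a stationary distribution, and the factor $e^{2c_0 m/k}$ only bends the low-degree part of the curve, so the tail slope tends to $-3$ regardless of $c$.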

For the case of non-unity $\alpha$, we resort to simulation studies: for $\alpha = 0.5$ the divergence observed for the uniform deletion case is checked, i.e., the PL exponent does not go to infinity as $c$ approaches 1, but it can still be much larger than 3; for $\alpha = 1.5$ (i.e., the high-degree nodes are protected even more), the distribution becomes slightly more heavy-tailed (see Fig. 2(b)).

Lifetime Distribution of Webpages. In addition to the degree distribution, we study the lifetime characteristics of webpages. In order to obtain the empirical lifetime distribution of webpages, we gathered and processed additional crawls from the period 2003 to 2005 from WebBase (SET1). Our analytical model predicts that the probability that a given node survives for $l$ time steps has an initial exponential decay, followed by a slow convergence to a positive non-zero constant as $l$ grows: $P(l) \simeq \exp\{-2c_0[1 - (1 + l/r)^{-1/2}]\}$, where $c_0$ and $r$ are positive constants (see Eq. (9)). In other words, a given node has a non-zero probability of achieving a very long, if not eternal, life.

FIG. 3: The lifetime distribution of webpages from our sampled Web (horizontal axis: Month). The empirical distribution matches well with the analytical function predicted by our model.

Our model thus offers an explanation for the empirical observation that a significant fraction of webpages have short lifetimes, while some webpages persist for a very long time (Fig. 3). An empirical webpage lifetime distribution of similar form has also been obtained by another measurement study [21], but no theoretical explanation has been offered.

B. PA and DPA with Uniform Deletion ($\alpha = 0$)

We now consider the double preferential attachment (DPA) with uniform deletion model by setting the parameter $\alpha = 0$. Using different methods, the same model has been analyzed in […], [19]. The evolution of the expected degree of the node born at time step $i$ is described at time $t$ by the following equation:

$$\frac{dk(i,t)}{dt} = m\,\frac{k(i,t)}{S(t)} - c\,\frac{k(i,t)}{N(t)} + 2bm\,\frac{k(i,t)}{S(t)} \tag{11}$$

where $N(t) = (1-c)t$, and $S(t)$ is described by

$$\frac{dS(t)}{dt} = 2(1+b)m - 2c\,\frac{S(t)}{N(t)} \tag{12}$$

Solving Eq. (12), we get:

$$S(t) = 2(1+b)m\,\frac{1-c}{1+c}\,t \tag{13}$$

Now, Eq. (11) becomes:

$$\frac{dk(i,t)}{dt} = \frac{k(i,t)}{2(1-c)\,t}\cdot\frac{1-c+2b}{1+b} \tag{14}$$

Solving Eq. (14), we get:

$$k(i,t) = m\left(\frac{t}{i}\right)^{\beta} \tag{15}$$

where $\beta = \dfrac{1-c+2b}{2(1-c)(1+b)}$. In addition, we solve the $D(i,t)$ equation and obtain:

$$D(i,t) = \left(\frac{i}{t}\right)^{c/(1-c)} \tag{16}$$

After invoking Eq. (4), the scaling relation for the power law exponent $\gamma$ can be derived as:

$$\gamma = 1 + \frac{2(1+b)}{1-c+2b} \tag{17}$$

The analytical prediction matches closely with large-scale simulation results, as shown in Fig. 4(a). From our Web dataset, the parameter $b$ is estimated to be quite low, at $b \approx 0.1$ (see Appendix). For such a low value of $b$, the power law exponent is expected to diverge rapidly as the turnover rate $c$ moves away from zero, as shown in Eq. (17). Thus, the emergence of double preferential attachment edges is not sufficient to explain the observed resilience of the power law under a high rate of turnover.

C. PA and DPA with PS

We now investigate the preferential survival model with double preferential attachment by setting the parameters $\alpha = 1.0$ and general $b$. It is simple to show that the sum of node degrees evolves as:

$$S(t) = \left(2m(1+b) - \frac{2c}{\alpha_0}\right)t \tag{18}$$



As in the previous section, the assumption of fast convergence of $\alpha_0$ and $\langle k\rangle$ is numerically verified and used. The evolution of the expected degree of the $i$th node at time $t$ is described by the following:

$$\frac{dk(i,t)}{dt} = m\,\frac{k(i,t)}{S(t)} - c\,k(i,t)\,\frac{\langle k\rangle^{-1}}{N(t)\,\alpha_0} + 2bm\,\frac{k(i,t)}{S(t)} \tag{19}$$

where $\langle k\rangle$ is the average degree in the network. Solving the above equation, we get:

$$k(i,t) = m\left(\frac{t}{i}\right)^{\beta} \tag{20}$$

with

$$\beta = \frac{1}{2}\left(1 + \frac{bm}{(1+b)m - c/\alpha_0}\right) \tag{21}$$

Finally, the power law exponent is given by:

$$\gamma = 1 + \frac{1}{\beta} = 1 + \left[\frac{1}{2}\left(1 + \frac{bm}{(1+b)m - c/\alpha_0}\right)\right]^{-1} \tag{22}$$



FIG. 4: Power law exponent for the degree distribution of networks generated by simulation ($m = 8$). At the time the snapshots are taken, the networks reach 20,000 nodes. (a) Double Preferential Attachment (DPA) with uniform deletion ($m = 8$, $\alpha = 0$): $b = 1/8$ (circles), $b = 1/2$ (squares), $b = 1$ (stars). Note that tracking power law exponent values much above 3.0 is rather difficult, since the distribution decreases rapidly and the power law region typically spans less than one decade; hence, some data points for $b = 1/8$ are omitted. As shown, unless $b$ is unreasonably high, DPA alone cannot stop the divergence of the PL exponent under heavy turnover. (b) Preferential survival with double preferential attachment edges ($\alpha = 1$): $b = 1/8$ (squares) and $b = 1$ (circles). The simulation results are in agreement with the theoretical prediction from Eq. (22).

Note that $\beta$ in Eq. (21) is a strictly increasing function of $c$ for $0 < c < 1$, assuming that $\alpha_0$ stays roughly constant for different values of $c$ (this assumption has been numerically verified). Thus, the power law exponent actually decreases as the turnover rate increases. After obtaining the value of $\alpha_0$ numerically, the analytical results are verified by large-scale simulations (see Fig. 4(b)).
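The opposite trends of Eq. (17) (uniform deletion) and Eq. (22) (preferential survival, $\alpha = 1$) can be tabulated directly. The sketch below simply evaluates both formulas; because Eq. (22) depends on the stationary moment $\alpha_0$, which the text obtains numerically, the value used here ($\alpha_0 = 0.15$) is a hypothetical placeholder held fixed across $c$, mirroring the assumption stated above. The values $m = 10$ and $b = 0.1$ are the empirical estimates quoted in Sec. I.

```python
def gamma_uniform_dpa(c, b):
    """Eq. (17): gamma = 1 + 2(1+b)/(1-c+2b) for uniform deletion (alpha = 0)."""
    return 1.0 + 2.0 * (1.0 + b) / (1.0 - c + 2.0 * b)


def gamma_ps_dpa(c, b, m, alpha0):
    """Eqs. (21)-(22): gamma = 1 + 1/beta for preferential survival (alpha = 1)."""
    denom = (1.0 + b) * m - c / alpha0
    if denom <= 0:
        raise ValueError("Eq. (21) requires (1+b)m > c/alpha0")
    beta = 0.5 * (1.0 + b * m / denom)
    return 1.0 + 1.0 / beta


if __name__ == "__main__":
    m, b = 10, 0.1   # empirical estimates quoted in the text
    alpha0 = 0.15    # hypothetical placeholder for the numerically obtained moment
    print("   c   gamma Eq.(17)   gamma Eq.(22)")
    for c in (0.0, 0.3, 0.6, 0.9, 0.99):
        print(f"{c:4.2f}  {gamma_uniform_dpa(c, b):12.2f}  {gamma_ps_dpa(c, b, m, alpha0):14.2f}")
```

With these inputs the uniform-deletion exponent grows steeply as $c \to 1$, while the preferential-survival exponent drifts slightly below 3; the exact PS values shift with $\alpha_0$, so the table should be read qualitatively.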

In summary, our model predicts the following: for $\alpha = 0$, our model reduces to the case of uniform deletion, where the power law exponent is predicted to diverge as $c$ approaches 1 when the amount of DPA edges is moderate. In the case of PS with $\alpha = 1$, our analysis shows that the power law exponent is expected to be $\gamma = 3$ for any turnover rate $c$ between 0 and 1. Thus, the preferential survival mechanism by itself can prevent the divergence of the PL exponent predicted for the uniform failure case. Furthermore, PS aided by a weak DPA dynamic can reinforce and stabilize the network's degree hierarchy even more as $c$ approaches 1.



IV. DISCUSSIONS 

Our work takes an important step in understanding a 
relatively unexplored class of networks: the class of ad 
hoc networks with significant rates of addition and dele- 
tion of nodes. To the best of our knowledge, we provide 
the first empirical study on the nature of deletion dynam- 
ics in complex networks. Using longitudinal Web crawl 
data that spanned the period of one year, we discovered 
the preferential survival mechanism and quantified its pa- 
rameter. In order to study the implication of the prefer- 
ential survival dynamic, we analyzed a stochastic model 
that incorporated the standard preferential attachment 
mechanism with preferential survival and showed that 
the power law exponent is preserved even in the face of 
extremely high rate of node deletion. 



As large-scale network systems play an increasingly important role in our daily lives, the dynamics identified in this work could shed light on empirical observations of real-world networks, and could be good candidates to be harnessed to engineer network applications. For example, from the perspective of modeling real-world networks, the empirical observation of PS (with $\alpha \approx 1$) and a weak DPA mechanism in the crawled web data can by itself explain the resilience of the PL exponent observed in the web networks, even though the deletion rates are quite high (see Section III C for the analysis results). As noted in the introduction, however, a more complete modeling of the web data should take into consideration the underlying fitness distribution of the pages, and such a comprehensive modeling effort is beyond the scope of this paper.

The models studied in this paper could also find applications in the design of engineered networks. For example, in order to develop scalable search algorithms for large-scale peer-to-peer (P2P) networks such as Gnutella, researchers have proposed efficient search protocols that harness the network's power law degree distribution to deliver search hits at a traffic cost that scales sublinearly with network size […]. Given the unreliable and ad hoc nature of the nodes in peer-to-peer networks, it is important to develop distributed and local protocols that guarantee the maintenance of the network's power law topology even in the face of extremely high rates of node turnover. One of the solutions proposed in […] is to introduce a compensatory mechanism in which existing nodes compensate for lost edges. In this work, however, we showed that the preferential survival (PS) mechanism can stabilize the power law exponent. One potential way to implement PS in a P2P setup would be to enforce an incentive mechanism that encourages high-degree nodes to remain in the network. For example, peers can be rewarded with virtual monetary payments in an incremental manner for extending their availability in the network; in return, the virtual earnings can then be exchanged for services such as priority in downloading files. The exact reward function can be tuned to generate a preferential survival mechanism with a deletion kernel $D(k) \propto k^{-\alpha}$, with $\alpha > 1$. The precise implementation of the incentive mechanism is beyond the scope of the current work and is left for future investigations.



APPENDIX: FURTHER EMPIRICAL ANALYSIS 
OF WEB DYNAMICS 

Estimating the Deletion Rate. From SET1, we found that the deletion rate is quite high, $c = 0.88$ (Fig. 5). We further found that more than 10% of the nodes are involved in turnover (Fig. 5, inset). SET2 yields very similar results, indicating a high rate of turnover, with the average turnover rate found to be $c = 0.96$. However, these figures are overestimates, since we are taking measurements from a fixed set of web hosts.

Consider the liberal assumption that the Web grows at a rate of about 35% annually (i.e., roughly 3% monthly). This assumption implies that the Web will double its current enormous size of more than 11 billion nodes in just a little over two years. This rough estimate is obtained by noting that the number of web hosts has been growing at a rate of 25% per year for the past few years, as measured by the Netcraft server survey j3fjj.
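The growth figures above can be checked with a few lines of compounding arithmetic. The sketch below (helper names are ours) converts annual rates to monthly ones and computes doubling times: 35% per year corresponds to about 2.5% per month and a doubling time of about 2.3 years, while 3% per month, as used in the estimate that follows, compounds to roughly 43% per year and is therefore slightly more liberal still.

```python
import math


def monthly_from_annual(annual_rate):
    """Compounded monthly rate equivalent to a given annual growth rate."""
    return (1.0 + annual_rate) ** (1.0 / 12.0) - 1.0


def annual_from_monthly(monthly_rate):
    """Annual growth rate obtained by compounding a monthly rate for 12 months."""
    return (1.0 + monthly_rate) ** 12 - 1.0


def doubling_time_years(annual_rate):
    """Years needed to double in size at a constant annual growth rate."""
    return math.log(2.0) / math.log(1.0 + annual_rate)


# 25%: host growth reported by the Netcraft survey; 35%: liberal page-growth assumption.
for annual in (0.25, 0.35):
    print(f"{annual:.0%}/year -> {monthly_from_annual(annual):.2%}/month, "
          f"doubling in {doubling_time_years(annual):.2f} years")
print(f"3%/month -> {annual_from_monthly(0.03):.1%}/year")
```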

Given that the Web grows at a rate of 3% monthly, and that around 10% of the webpages are removed in a month's time, a set of new nodes equal in size to 13% of the network must be inserted each month to achieve the 3% monthly growth. These figures translate to a deletion rate of $c = 10\%/13\% \approx 0.77$ on the Web. Note that measurements from the literature indicate that up to 20% of webpages are deleted in a month's time [21], which would imply an even higher deletion rate. In addition, even if we assume an unlikely annual growth rate of 100% (about 6% monthly), the deletion rate is still well above 0.5, which is quite significant.
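The estimate above follows from a simple balance: to grow by a net fraction $g$ per month while losing a fraction $r$, the network must insert $r + g$, so $c = r/(r + g)$. A minimal sketch of this arithmetic (the function name is ours), measuring all fractions against the network size at the start of the month:

```python
def deletion_rate(removed_frac, net_growth):
    """Deletion rate c = (nodes deleted) / (nodes inserted) over one period.

    removed_frac: fraction of the network removed per period (e.g. 0.10 per month).
    net_growth:   net growth of the network per period (e.g. 0.03 per month).
    Growing by net_growth while losing removed_frac requires inserting
    removed_frac + net_growth, relative to the size at the start of the period.
    """
    return removed_frac / (removed_frac + net_growth)


monthly_growth_100pct = 2 ** (1 / 12) - 1  # 100% per year compounded monthly (~5.9%)

print(round(deletion_rate(0.10, 0.03), 2))                   # 0.77: the estimate in the text
print(round(deletion_rate(0.20, 0.03), 2))                   # 0.87: with 20% monthly removal
print(round(deletion_rate(0.10, monthly_growth_100pct), 2))  # 0.63: even at 100% annual growth
```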

FIG. 5: The monthly turnover rate of the sampled Web (SET1), comprising more than 170,000 pages, for the year 2006. The dashed horizontal line denotes the time-average turnover rate: $c = 0.88$. The inset shows the number of webpages (in thousands) inserted (circles) and removed (triangles) for each month. More than 10% of the webpages are involved in turnovers for most months.

Weak Mechanism of Double Preferential Attachment Edges. On the Web, existing webpages often make new links to each other, and these new links are found to attach in a double preferential manner (as determined from our empirical dataset, discussed in the next subsection). Under the assumption of uniform deletion, we found that even with DPA, the power law exponent behaves only as

$$\gamma = 1 + \frac{2(1+b)}{1-c+2b}$$

(derived in Sec. III B), where $b$ is the ratio of the number of DPA edges to preferential attachment (PA) edges (from a joining node) per time step. As $c \to 1$, this expression tends to $\gamma = 2 + 1/b$, so to get $\gamma < 3$ for $c \approx 1$, $b$ has to be $> 1$. However, our empirical data (both SET1 and SET2) indicate that the DPA edges are only a small fraction of the PA edges (i.e., $b \approx 0.1$), and thus DPA by itself cannot explain the low power law exponent we observe under a high turnover rate (e.g., for $c = 0.88$ and $b = 0.1$, the predicted exponent is $\gamma \approx 7.9$). In the case of both preferential survival and DPA, we use the empirically obtained values $\alpha = 1$ and $b = 0.1$, and find that the power law exponent actually decreases as the network experiences a higher turnover rate (see Sec. III C). Thus, when used in conjunction with the preferential survival dynamic, even a weak DPA mechanism (i.e., $b = 0.1$) is critical in driving the power law exponent close to 2 in the face of extremely high rates of turnover.
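To make the uniform-deletion result concrete, the sketch below simply evaluates $\gamma = 1 + 2(1+b)/(1-c+2b)$ for the empirical regime and for the limit $c \to 1$; the function name is ours. As a sanity check, $c = 0$ and $b = 0$ recover the Barabási–Albert exponent $\gamma = 3$.

```python
def gamma_uniform_deletion(c, b):
    """Degree exponent under uniform deletion with DPA: 1 + 2(1 + b) / (1 - c + 2b).

    c: nodes deleted per node added (0 <= c <= 1).
    b: DPA edges per PA edge added in each time step (must be > 0 when c = 1).
    """
    return 1.0 + 2.0 * (1.0 + b) / (1.0 - c + 2.0 * b)


# Empirical regime from the text: high turnover, weak DPA.
print(f"c=0.88, b=0.1 -> gamma = {gamma_uniform_deletion(0.88, 0.1):.1f}")  # 7.9

# In the limit c -> 1 the exponent tends to 2 + 1/b, so gamma < 3 requires b > 1.
for b in (0.1, 0.5, 1.0, 2.0):
    print(f"c=1, b={b} -> gamma = {gamma_uniform_deletion(1.0, b):.2f} "
          f"(2 + 1/b = {2 + 1 / b:.2f})")
```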

Measuring the Preferential Attachment Kernel. Although the preferential attachment (PA) model and the copying model [1, [l(| are widely accepted as models of the Web, relatively few direct measurement studies [25, 26, 27] have been performed to validate the linear preferential attachment kernel generated by these models. When a new node joins the existing network and attaches edges to existing nodes preferentially, we obtain the attachment kernel $\Pi(k) \propto k$, where $k$ denotes the degree of the target node. Using our data set, we performed measurements on the sets of new nodes that appear every month, and confirmed the validity of the PA hypothesis (Fig. 6(a)). Similarly, when a new edge emerges and attaches to two existing nodes preferentially, we obtain the double preferential attachment (DPA) kernel $\Pi(k_1, k_2) \propto k_1 k_2$, where $k_1$ and $k_2$ denote the degrees of the target nodes. We repeated the same measurement on the set of new edges that attach to existing nodes in a given month, and confirmed the validity of the DPA hypothesis (Fig. 6(b)).
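The measurement can be illustrated on synthetic data. The sketch below is not our crawl pipeline: it grows a hypothetical network by linear PA, takes a degree snapshot at time $t_0$, records which existing nodes receive the edges created in a following window, and estimates $\Pi(k) = A(k)/N(k)$, where $A(k)$ counts new edges received by nodes of degree $k$ at $t_0$ and $N(k)$ is the number of such nodes. Because $\Pi(k) \propto k^{\beta}$ implies a cumulative function $\kappa(k) = \sum_{k' \le k} \Pi(k') \propto k^{\beta+1}$, a log-log slope near 2 indicates linear PA, which is how the slope of 1.9 in Fig. 6(a) reads as an exponent of 0.9. The network sizes, the window length, and the `min_nodes` cutoff for sparse degree classes are all illustrative assumptions. The DPA kernel could be examined analogously by binning each new edge by its pair of endpoint degrees.

```python
import math
import random
from collections import Counter


def grow_pa(degree, stubs, n_new, m, rng):
    """Add n_new nodes, each linking to m distinct existing nodes with P(target) ∝ degree.

    degree: dict node -> degree (updated in place); node ids are 0..len(degree)-1.
    stubs:  list holding each node once per unit of degree (updated in place),
            so a uniform draw from it is a degree-proportional draw.
    Returns the list of existing nodes that received the new edges.
    """
    hits = []
    for _ in range(n_new):
        new = len(degree)
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(stubs))
        for t in chosen:
            degree[t] += 1
            stubs.append(t)
            hits.append(t)
        degree[new] = m
        stubs.extend([new] * m)
    return hits


def cumulative_kernel(snapshot, hits, min_nodes=10):
    """Cumulative attachment function kappa(k) = sum over k' <= k of A(k')/N(k').

    snapshot: dict node -> degree at time t0.
    hits:     targets of edges created after t0 (nodes absent at t0 are ignored).
    Degree classes with fewer than min_nodes nodes are skipped as too noisy.
    """
    n_k = Counter(snapshot.values())                           # N(k): nodes of degree k at t0
    a_k = Counter(snapshot[t] for t in hits if t in snapshot)  # A(k): new edges, binned by t0 degree
    ks, kappa, total = [], [], 0.0
    for k in sorted(n_k):
        if n_k[k] < min_nodes:
            continue
        total += a_k[k] / n_k[k]
        if total > 0:
            ks.append(k)
            kappa.append(total)
    return ks, kappa


def loglog_slope(xs, ys):
    """Ordinary least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den


rng = random.Random(42)
m = 2
degree = {v: m for v in range(m + 1)}                # seed: complete graph on m + 1 nodes
stubs = [v for v in range(m + 1) for _ in range(m)]
grow_pa(degree, stubs, 5000, m, rng)                 # network up to time t0
snapshot = dict(degree)                              # degrees at t0
hits = grow_pa(degree, stubs, 1000, m, rng)          # edges created in the next window
ks, kappa = cumulative_kernel(snapshot, hits)
slope = loglog_slope(ks, kappa)
print(f"cumulative slope {slope:.2f}, implied kernel exponent {slope - 1:.2f}")
```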
FIG. 6: (a) Empirical evidence of the preferential attachment of new webpages introduced in February 2006. The dotted line has a slope of 1.9 on a log-log scale in a cumulative function plot, which suggests that the preferential attachment kernel is of the form $k^{0.9}$. The exponent of 0.9 is very close to the exponent of 1.0 of a linear preferential attachment kernel (also the attachment kernel obtained by the "copying" mechanism). (b) Empirical evidence of double preferential attachment (data from April 2006). Since the cumulative function is plotted, a slope of 2 (dotted line) on a log-log scale corresponds to double preferential attachment.